A replication of data analysis presented in Walker et al (2019) ‘Increasing wildfires threaten historic carbon sink of boreal forest soils’.
Required Packages:
library(curl)
library(ggplot2)
library(lmtest)
library(effects)
library(broom)
Required Data:
s <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Radiocarbon_site_data.csv')
site.data <- read.csv(file = s, header = TRUE, stringsAsFactors = TRUE) #general site classifications for each plot (moisture class, stand age and stand age class)
head(site.data)
## burned_site_plot latitude longitude elevation slope aspect moisture_johnstone
## 1 SS33-3A 60.93776 -117.3727 227.547 0 -9999 mesic
## 2 SS33-4A 60.94248 -117.3788 229.535 0 -9999 mesic-subhygric
## 3 ZF104-110A 63.10914 -113.9700 293.283 0 -9999 subhygric
## 4 ZF104-63B 63.06745 -113.7327 266.376 0 -9999 mesic-subhygric
## 5 ZF104-83B 63.04578 -114.0243 270.940 0 -9999 mesic
## 6 ZF20-10A 61.80457 -116.7246 238.162 0 -9999 mesic-subxeric
## moisture_class stand_age_rings stand_age_class residual_sol_depth burn_depth
## 1 Moist 90 old-burned 13.55 9.53
## 2 Wet 105 old-burned 19.40 6.83
## 3 Wet 56 young-burned 24.90 10.00
## 4 Wet 40 young-burned 10.70 7.96
## 5 Moist 54 young-burned 5.95 13.76
## 6 Moist 84 old-burned 7.95 5.31
## prefire_sol_depth bg_c_residual bg_c_combusted prefire_bg_c tree_density
## 1 23.08 6970.04 3821.72 10791.76 3.13
## 2 26.23 8754.96 2929.55 11684.50 0.60
## 3 34.90 6958.69 2674.51 9633.20 0.83
## 4 18.66 3309.66 3199.05 6508.71 0.88
## 5 19.71 3216.49 4133.52 7350.01 0.63
## 6 13.26 2242.22 2092.95 4335.16 1.93
## tree_c_prefire
## 1 2938.43
## 2 3125.83
## 3 549.25
## 4 163.68
## 5 1129.18
## 6 2414.58
d <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Soil_radiocarbon_data.csv')
soil.data <- read.csv(file = d, header = TRUE, stringsAsFactors = TRUE) #radioactive C data for site soil depth increments and atmosphere at the time of stand establishment
head(soil.data)
## burned_site_plot latitude longitude sample_transect sample_monolith
## 1 SS33-3A 60.93776 -117.3727 0 1
## 2 SS33-3A 60.93776 -117.3727 0 1
## 3 SS33-3A 60.93776 -117.3727 0 1
## 4 SS33-3A 60.93776 -117.3727 12 2
## 5 SS33-3A 60.93776 -117.3727 12 2
## 6 SS33-3A 60.93776 -117.3727 12 2
## site_sample_transect soil_depth_increment increment_location
## 1 SS33-3A.0 2 TOP
## 2 SS33-3A.0 12 BOTTOM
## 3 SS33-3A.0 1 TOP
## 4 SS33-3A.12 20 BOTTOM
## 5 SS33-3A.12 1 TOP
## 6 SS33-3A.12 2 TOP
## bottom_sol_material field_sample_id uciams_id Delta14C std_deviation
## 1 mineral RC86 190142 81.2 2.1
## 2 mineral RC90 190133 40.9 2.4
## 3 mineral RC85 190141 71.4 2.1
## 4 mineral RC30 192176 46.3 1.5
## 5 mineral RC25 192199 37.9 2.0
## 6 mineral RC26 192175 111.2 1.5
## soil_class_pre.post_bomb stand_age_rings stand_year X14C_yr_stand_age
## 1 post 90 1924 -12.88
## 2 post 90 1924 -12.88
## 3 post 90 1924 -12.88
## 4 pre 90 1924 -12.88
## 5 post 90 1924 -12.88
## 6 post 90 1924 -12.88
## stand_age_pre.post_bomb
## 1 pre
## 2 pre
## 3 pre
## 4 pre
## 5 pre
## 6 pre
Boreal regions have extremely thick soil organic layers due to their relatively slow rates of decomposition. Meaning they can be extremely effective in storing and retaining large amounts of carbon, preventing C from being emitted to the atmosphere and contributing to the formation of greenhouse gases.
When wildfires sweep through boreal regions, they combust the exposed soil organic layer, releasing stored C to the atmosphere. Due to the thick nature of these soil organic layers (SOLs), combustion is restricted to a proportion of the entire SOL. The portion that is unaffected by the most recent burning event is expected to retain ‘legacy carbon’.
With increasing frequency and severity of wildfire events as a result of global warming, these legacy carbon pools are under threat.
The goal of the original paper was to assess legacy C loss in sites across the Northwest Territory, Canada by identifying sites that possessed legacy C before and retained their legacy C pools after a burning event in 2014. Sites present with Legacy C were identified (through radiocarbon dating) as sites with any organic soil C that is older than the stand age (time since previous burning event).
In 32 plots \(\Delta^{14}C\) (parts per thousand difference between sample carbon-14:carbon-12 and the international standard ratio) was measured at several soil depth increments. The 32 plots were assigned a moisture class (based on their topography, drainage, presence of permafrost and their soil class) and a stand age class (young < 60 years since previous burning event, old > 70 years, based on the historic fire return interval of North American northwest coast boreal ecosystems).
Soil samples were dated (based on \(\Delta^{14}C\) values) relative to a known historic peak C level from nuclear bomb testing in the area (further known as bomb peak).
Site establishment was dated based on tree rings.
The original paper conducted Firth’s bias reduced logistic regression to assess whether the presence of legacy C could be predicted from stand age and the prefire SOL depth (which is acting as a proxy for moisture class), and whether the combustion of legacy C could be predicted from the stand age and the proportion of the SOL burned (which is acting as a proxy for fire severity). They used Firth’s logistic regression to reduce problems associated with highly predictive covariates, common in generalized linear model regressions using smaller sample sizes. I attempted the Firth’s bias reduced logistic regression using the {logistf} package as well as the standard logistic regression using the glm() function and neither produced significant results so I kept in the standard regression code to condense the size of this report.
The authors used AIC to test for interactions between covariates in which they found that the interaction term between predictor variables could be dropped for the final model. I found a similar result running log-likelihood tests so I did not replicate the AIC model selection.
Finally, the original authors explored mixed modeling using individual fires as a random effect but found the mixed modelling did not produce significantly different results than the logistic regression and so again I did not replicate this analysis.
To identify plots that possess legacy C, the \(\Delta^{14}C\) values of the basal organic soil layer were compared to \(\Delta^{14}CO_2\) values of the atmosphere during the year of stand establishment. Legacy C was considered present if the base of the SOL was older than the year of stand establishment (meaning \(\Delta^{14}C\) lower (having decayed for longer) than \(\Delta^{14}CO_2\)). Therefore the first step is to extract basal SOL measurements from the larger data set.
basal.layer <- soil.data[soil.data$increment_location == 'BOTTOM', ]
basal.layer <- basal.layer[duplicated(basal.layer$burned_site_plot) == FALSE,]
Next we determine and label the presence/absence of legacy carbon for each plot using a for loop with an embedded if/else conditions.
basal.layer$LC <- NULL
for (i in 1:nrow(basal.layer)) {
if (basal.layer$Delta14C[i] < basal.layer$X14C_yr_stand_age[i]) {
basal.layer$LC[i] <- 'Present'
}
else {
basal.layer$LC[i] <- 'Absent'
}
}
sum(basal.layer$LC == 'Present') #number of sites determined to possess legacy C
## [1] 19
For visualization, we must manipulate the data, combining the carbon measurements of the basal soil layer and the atmosphere at time of stand establishment into a single variable, for the ggplot object.
basal.carbon <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'Delta14C', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')] #extract Delta14C and other associated site information
basal.carbon$type <- 'Soil.Base' #designate delta 14 C as class soil base
names(basal.carbon) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class','Stand.Class','LC', 'Location')
basal.stand.age <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'X14C_yr_stand_age', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')]
basal.stand.age$type <- 'Stand.Age' #designate atm C as a class stand age
names(basal.stand.age) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class', 'Stand.Class','LC', 'Location')
legacy.carbon <- rbind(basal.carbon, basal.stand.age)
Divide the legacy carbon data set into pre and post bomb peak sites.
legacy.carbon.pre <- legacy.carbon[legacy.carbon$Stand.Class == 'pre',]
legacy.carbon.post <- legacy.carbon[legacy.carbon$Stand.Class == 'post',]
Then we can plot basal layer \(\Delta^{14}C\) against atmospheric \(\Delta^{14}CO_2\) at time of stand establishment (pre and post bomb peak)
Figure 2 b and c from original paper
ggplot(legacy.carbon.pre, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon')
Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated pre-bomb peak (pre 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.
ggplot(legacy.carbon.post, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = 'dashed')
Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated post-bomb peak (post 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.
We can also visualize the relationship between C and our expected predictor variables.
#Creating a data set that has both our predictor and response variables so we can plot the relationships and call on it during the regression.
log.r.data <- cbind(basal.layer, site.data)
log.r.data <- log.r.data[,-c(21,22)]
ggplot(data = log.r.data, aes(x = LC, y = stand_age_rings)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy C v. Stand Age', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')
Caption: Legacy C class (presence or absence) plotted against the sites stand age (time since stand establishment, or time since previous burning event). The dashed horizontal line at 60 years represents the cut off point for age class (young or old). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). Legacy C presence (red) was more likely in younger stands where Legacy C was absent (green) in only older stands.
We can see from comparing the basal layer C levels of old and young plots that they have similar \(\Delta^{14}C\) levels which indicates they both underwent similar cycles of accumulation/combustion in previous burn cycles. Therefore the large proportion of young stands harboring legacy C could be due to the young soils not yet re-accumulating their C stocks.
ggplot(data = log.r.data, aes(x = stand_age_class, y = Delta14C)) + geom_boxplot(aes(color = stand_age_class)) + theme_bw() + labs(title = 'Stand Age Class v. Delta 14 C', x = 'Stand Age', y = 'Delta 14 C (parts per thousand)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")
Caption: Stand age class (old = green and young = red) plotted against basal \(\Delta^{14}C\) levels to establish historic burn cycle effects on C combustion and accumulation. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).
ggplot(data = log.r.data, aes(x = LC, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy Carbon v. Prefire SOL Depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")
Caption: Legacy C class (presence = red and absence = green) plotted against prefire SOL depth (cm). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).
You can see Legacy C is present in all plots with prefire SOL depth > 30 cm which is indicative of wetter ecosystems. Using prefire SOL depth as a proxy for moisture class is appropriate, as we can see clear trends across prefire SOL depth when plotted against mositure class.
ggplot(data = log.r.data, aes(x = moisture_class, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Moisture Class v. Prefire SOL depth', x = 'Moisture Class', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred', 'darkblue'))
Caption: Moisture class (dry, moist and wet) plotted against the prefire SOL depth (cm). Each moisture class subset by the presence (red) or absence (green) of Legacy C. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The prefire SOL depths are distinct for each moisture class indicating prefire SOL depth is an appropriate proxy for moisture class.
We will generate a generalized linear model for this data because we cannot assume a normal distribution of our binary response variable or its residuals and therefore cannot fit the assumptions of a general linear model. We use a logistic regression, a form of a generalized linear model, because our response variable is binary (presence or absence of legacy C) so we must use the logit link function to calculate the natural log of the odds ratio between our two possible outcomes and use that output as our response variable. When preparing the data to be placed into the glm() function we need to make sure our variables are factored to avoid fitting errors when running the regression.
log.r.data$LC <- as.factor(log.r.data$LC)
levels(log.r.data$LC)
## [1] "Absent" "Present"
To evaluate the model significance we can compare model fit between a full, a reduced and a null model. In our case the full model will include the interaction terms between the predictor variables stand age and SOL depth. The reduced model will remove only the interaction term between the two predictor variables and the null model will be an intercept only model.
full.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth*stand_age_rings, family = binomial)
summary(full.presence)
##
## Call:
## glm(formula = LC ~ prefire_sol_depth * stand_age_rings, family = binomial,
## data = log.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6056 -0.9864 0.5256 0.7918 1.8665
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.714911 2.336077 1.162 0.2452
## prefire_sol_depth -0.057961 0.109514 -0.529 0.5966
## stand_age_rings -0.042048 0.025303 -1.662 0.0965 .
## prefire_sol_depth:stand_age_rings 0.001444 0.001190 1.214 0.2249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 34.501 on 28 degrees of freedom
## AIC: 42.501
##
## Number of Fisher Scoring iterations: 5
reduced.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial)
summary(reduced.presence)
##
## Call:
## glm(formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial,
## data = log.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6482 -1.0558 0.4442 0.8946 1.7336
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.169088 1.172339 0.144 0.8853
## prefire_sol_depth 0.079485 0.044768 1.775 0.0758 .
## stand_age_rings -0.014321 0.008787 -1.630 0.1031
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 36.289 on 29 degrees of freedom
## AIC: 42.289
##
## Number of Fisher Scoring iterations: 4
null.presence <- glm(data = log.r.data, formula = LC ~ 1, family = binomial)
summary(null.presence)
##
## Call:
## glm(formula = LC ~ 1, family = binomial, data = log.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.342 -1.342 1.021 1.021 1.021
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3795 0.3599 1.054 0.292
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.23 on 31 degrees of freedom
## Residual deviance: 43.23 on 31 degrees of freedom
## AIC: 45.23
##
## Number of Fisher Scoring iterations: 4
We can compare the models using a log likelihood test which utilizes a ratio of the log-likelihoods as the test statistic and utilizes the Chi-squared distribution to calculate the p-values. Therefore we can run a log-likelihood test by arguing the two models to the anova() function and additionally including test = 'Chisq' argument.
anova(reduced.presence, full.presence, test = 'Chisq')
## Analysis of Deviance Table
##
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 29 36.289
## 2 28 34.501 1 1.7878 0.1812
Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.
We can also run a log-likelihood test using the lrtest() from the {lmtest} package
lrtest(reduced.presence, full.presence)
## Likelihood ratio test
##
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 3 -18.145
## 2 4 -17.251 1 1.7878 0.1812
Here the higher log likelihood value indicates a greater model fit, but the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.
Next, we must compare the accepted model(the reduced model) against the null model
anova(null.presence, reduced.presence, test = 'Chisq')
## Analysis of Deviance Table
##
## Model 1: LC ~ 1
## Model 2: LC ~ prefire_sol_depth + stand_age_rings
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 31 43.230
## 2 29 36.289 2 6.9407 0.03111 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.
Our slope estimate (\(\beta1\)) now represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.
The coefficient/estimate for prefire SOL depth was positive. Therefore increasing prefire SOL depth results in an increased likelihood of legacy C being present. For every one cm of prefire SOL depth, the probability of legacy being present increases by 0.079485. Where the estimate for stand age is negative indicating every year the stand ages the likelihood of legacy C being present decreased by 0.014321. Even though neither of these relationships showed a significant p-value.
In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function. We can find the inverse of the logit link function by pulling it from the model.
fam <- family(reduced.presence)
rev.logit <- fam$linkinv #eta function
SOL.depth.change <- rev.logit(reduced.presence$coefficients[2])
stand.age.change <- rev.logit(reduced.presence$coefficients[3])
SOL.depth.change
## prefire_sol_depth
## 0.5198608
stand.age.change
## stand_age_rings
## 0.4964197
1 unit increase in SOL depth results in a 52 % increase in the likelihood of Legacy C presence 1 unit increase in stand age results in a 49% decrease in the likelihood of Legacy C presence
To test our null hypothesis that the response variable (Legacy C presence) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.
# Calculating the Wald Statistic for SOL depth
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald)) # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.07581763
#Wald statistic for stand age
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald)) # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.896879
Again neither predictor variable seems to show a significant p-value.
Finally, we can calculate the confidence intervals around our estimates
CI <- confint.default(reduced.presence, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood
CI
## 2.5 % 97.5 %
## (Intercept) -2.128654035 2.466829843
## prefire_sol_depth -0.008258719 0.167229045
## stand_age_rings -0.031542971 0.002900075
#Calculating the CIs for the actual odds ratios
SOL.depth.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age
SOL.depth.changeCI
## 2.5 % 97.5 %
## 0.4979353 0.5417101
stand.age.changeCI
## 2.5 % 97.5 %
## 0.4921149 0.5007250
Before we create the ggplot object we can transform the factored response variable (presence) into a binary value for easier data plotting.
log.r.data$LC.bin <- NULL
for (i in 1:nrow(log.r.data)) {
if (log.r.data$LC[i] == 'Present') {
log.r.data$LC.bin[i] <- 1
}
else {
log.r.data$LC.bin[i] <- 0
}
}
Although we ran a model with multiple predictor variables, it can be helpful to visualize the predicted likelihood of legacy C presence by each separate predictor variable.
Therefore, we have to generate a new model for each variable.
glm.sol.depth <- glm(data= log.r.data, formula = LC ~ prefire_sol_depth, family = 'binomial')
summary(glm.sol.depth)
##
## Call:
## glm(formula = LC ~ prefire_sol_depth, family = "binomial", data = log.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5716 -1.1258 0.4911 1.0939 1.3937
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.09037 0.88393 -1.234 0.2174
## prefire_sol_depth 0.06993 0.04077 1.715 0.0864 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 39.342 on 30 degrees of freedom
## AIC: 43.342
##
## Number of Fisher Scoring iterations: 4
glm.stand.age <- glm(data = log.r.data, formula = LC ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
##
## Call:
## glm(formula = LC ~ stand_age_rings, family = "binomial", data = log.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4450 -1.2695 0.7322 0.9323 1.5312
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.650387 0.916089 1.802 0.0716 .
## stand_age_rings -0.012385 0.008052 -1.538 0.1240
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.230 on 31 degrees of freedom
## Residual deviance: 40.635 on 30 degrees of freedom
## AIC: 44.635
##
## Number of Fisher Scoring iterations: 4
Then we have to generate a new range of values of our predictor variables for which we will predict LC presence.
new.sol.depth <- as.data.frame(seq(min(log.r.data$prefire_sol_depth),max(log.r.data$prefire_sol_depth), length.out = 32))
names(new.sol.depth) <- 'prefire_sol_depth' #must rename the column so the predict() function can match the model to the new data
new.stand.age <- as.data.frame(seq(min(log.r.data$stand_age_rings),max(log.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'
We can now predict the LC presence probabilities.
prediction.sol.depth <- cbind(new.sol.depth,predict(glm.sol.depth, newdata = new.sol.depth, type = 'link', se = TRUE))
prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))
and generate the upper and lower confidence intervals for each prediction (using 1.96 because it is the approx critical value for 95% CI). Since we argued the response type should be in the link scale we must back transform the CIs (and the fit when plotting) to plot the values on the appropriate response scale.
prediction.sol.depth$LL <- rev.logit(prediction.sol.depth$fit - 1.96 * prediction.sol.depth$se.fit)
prediction.sol.depth$UL <- rev.logit(prediction.sol.depth$fit + 1.96 * prediction.sol.depth$se.fit)
head(prediction.sol.depth)
## prefire_sol_depth fit se.fit residual.scale LL UL
## 1 6.340000 -0.6470421 0.6602462 1 0.1255244 0.6563432
## 2 7.799355 -0.5449953 0.6126501 1 0.1485791 0.6583160
## 3 9.258710 -0.4429484 0.5673067 1 0.1743841 0.6612731
## 4 10.718065 -0.3409016 0.5248001 1 0.2026991 0.6654565
## 5 12.177419 -0.2388548 0.4858755 1 0.2330496 0.6711655
## 6 13.636774 -0.1368080 0.4514602 1 0.2647022 0.6787545
prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
## stand_age_rings fit se.fit residual.scale LL UL
## 1 19 1.415075 0.7790426 1 0.4720670 0.9498840
## 2 25 1.340766 0.7370696 1 0.4740507 0.9418830
## 3 31 1.266457 0.6959197 1 0.4756329 0.9327963
## 4 37 1.192148 0.6557479 1 0.4767373 0.9225432
## 5 43 1.117839 0.6167454 1 0.4772701 0.9110610
## 6 49 1.043529 0.5791484 1 0.4771156 0.8983168
Finally, plot the predicted probabilities and their 95% CI alongside the actual data
Figure 3.a and b from original paper
p <- ggplot(data = prediction.sol.depth, aes(x= prefire_sol_depth, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.depth, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Prefire SOL depth (cm)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = prefire_sol_depth, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p
Caption: Predicted probability of Legacy C presence for prefire SOL depth. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values
p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = stand_age_rings, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p
Caption: Predicted probability of Legacy C presence for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values. The dashed line shows the cutoff for young or old stand age classification.
Here we will determine the surface soil establishment date relative to bomb peak by comparing \(\Delta^{14}C\) levels at the soil surface to \(\Delta^{14}C\) levels at soil depth increment of 2 cm. In natural states, depth should have less C because it is older (more C has decayed & surface layers accumulate vertically). Therefore, if C levels are higher at the surface than at the 2cm depth then the surface is dated as pre bomb peak. If the soil surface has less C than the 2cm depth, the surface layer is determined to be established post bomb peak. These classifications will be important for when we determine legacy C combustion later in the analysis.
surface.layer <- soil.data[soil.data$increment_location == 'TOP',] #Extract surface layer measurements
surface.layer$soil_depth_increment <- as.factor(surface.layer$soil_depth_increment)
Assigning stand age class (young; < 60 years or old > 70 years) based on the tree ring dates.
surface.layer$Age <- NULL
for (i in 1:nrow(surface.layer)) {
if (surface.layer$stand_age_rings[i] < 60) {
surface.layer$Age[i] <- 'Young'
}
else {
surface.layer$Age[i] <- 'Old'
}
}
We can plot the \(\Delta^{14}C\) at both soil increments (1 and 2 cm) to determine the relative soil surface dates. We can plot the pre-bomb and post-bomb sites separately to show the relationship of \(\Delta^{14}C\) between both soil increments more clearly.
#First, divide into pre and pot bomb peak data.
surface.layer.pre <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'pre', ]
surface.layer.post <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'post', ]
Figure 2.d and e from original paper
ggplot(surface.layer.pre, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))
Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated pre-bomb peak (pre 1966). Dashed lines represent young sites. Solid lines represent old sites.
ggplot(surface.layer.post, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))
Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated post-bomb peak (post 1966). Dashed lines represent young sites. Solid lines represent old sites.
To assess if legacy C was combusted during the burning event the \(\Delta^{14}C\) values of the soil surface samples were compared to atmospheric \(\Delta^{14}CO_2\) values at the time of stand establishment (only in plots where Legacy C was considered present pre burn). Legacy C was considered combusted if the soil surface was older than the stand age.
Therefore legacy C was combusted in the plot IF:
Soil surface C levels were lower than atmospheric stand age \(CO_2\) levels in sites where the stand age was pre bomb peak AND the soil surface was pre bomb peak.
The stand age was dated post bomb peak AND the soil surface was dated pre bomb peak
The soil surface C levels were higher than atmospheric stand age \(CO_2\) levels in sites where the stand age was post bomb peak AND the soil surface was post bomb peak.
Note: legacy C was not combusted if the stand age was pre bomb peak and the surface was post bomb peak.
First we can extract the soil surface \(\Delta^{14}C\) values.
top.surface.layer<- surface.layer[surface.layer$soil_depth_increment == 1,]
top.surface.layer <- top.surface.layer[duplicated(top.surface.layer$burned_site_plot) == FALSE,]
Then extract plots where Legacy C was present.
top.surface.layer$LP <- basal.layer$LC
legacy.plots <- top.surface.layer[top.surface.layer$LP == 'Present',]
Then we must identify and label combustion status for each plot.
legacy.plots$LC <- NULL
for (i in 1:nrow(legacy.plots)) {
if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre'){
if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
legacy.plots$LC[i] <- 'Combusted'
}
else {
legacy.plots$LC[i] <- 'Not.Combusted'
}
}
else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre') {
legacy.plots$LC[i] <- 'Combusted'
}
else if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
legacy.plots$LC[i] <- 'Not.Combusted'
}
else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
legacy.plots$LC[i] <- 'Not.Combusted'
}
else {
legacy.plots$LC[i] <- 'Combusted'
}
}
}
First we must manipulate the data for the ggplot object. Again combining our C measurements and assigning them a class.
surface.carbon <- legacy.plots[,-17]
surface.carbon$Location <- 'Soil.Surface'
colnames(surface.carbon)[12] <- 'Delta14C'
surface.stand.age <- legacy.plots[,-12]
surface.stand.age$Location <- 'Stand.Age'
colnames(surface.stand.age)[16] <- 'Delta14C'
legacy.combustion <- rbind(surface.carbon, surface.stand.age)
Then we can plot the soil surface C levels against the atmospheric \(CO_2\) at the time of stand establishment for all of the different site types.
#subset the legacy plot data set by site type.
legacy.combustion.pre.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.post.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.pre.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'post',]
legacy.combustion.post.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'post',]
Figure 2.f-i from original paper
ggplot(legacy.combustion.pre.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))
Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.
ggplot(legacy.combustion.post.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))
Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.
ggplot(legacy.combustion.pre.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))
Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.
ggplot(legacy.combustion.post.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon')
Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.
Similar to when we were assessing legacy C presence or absence, we can look at the combustion status relationship with our expected predictor variables.
First we must extract the appropriate plots (where Legacy C was present)
combustion.r.data <- log.r.data[log.r.data$LC == 'Present',]
combustion.r.data$combustion <- legacy.plots$LC #add the combustion class to the new logistic regression data
ggplot(data = combustion.r.data, aes(x = combustion, y = prefire_sol_depth)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Prefire SOL depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")
Caption: Legacy C (combusted = green and not combusted = red) plotted against prefire SOL depth. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).
ggplot(data = combustion.r.data, aes(x = combustion, y = stand_age_rings)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Stand Age ', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')
Caption: Legacy C (combusted = green and not combusted = red) plotted against stand age. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The horizontal dashed line at 60 years represents the cutoff for stand age class (young/old)
From both of these plots we can see no evidence for combustion in old sites, and drier and younger sites combusted more frequently.
To test our hypotheses for Legacy C combustion, we again must use a generalized linear model using the log link function because our response variable of combustion status is also binary. In this case our model will look at the predictor variables of stand age and the proportion of the soil organic layer burned (as a proxy for fire severity)
So first we must factor our response variable and calculate the proportion of the soil organic layer burned from the burn depth and the prefire sol depth.
combustion.r.data$combustion <- factor(combustion.r.data$combustion, levels = c('Not.Combusted', 'Combusted'))
levels(combustion.r.data$combustion) #important to factor in the correct order for glm model interpretation
## [1] "Not.Combusted" "Combusted"
combustion.r.data$proportion_sol_burned <- combustion.r.data$burn_depth/combustion.r.data$prefire_sol_depth
Then we can move onto model selection
Again we are creating a full model that includes the predictors interaction term, a reduced model without an interaction term and a null model (intercept only model)
full.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned * stand_age_rings, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(full.combustion)
##
## Call:
## glm(formula = combustion ~ proportion_sol_burned * stand_age_rings,
## family = binomial, data = combustion.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.26155 -0.07383 0.00000 0.01511 1.67247
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 20.0540 16.9873 1.181 0.238
## proportion_sol_burned -13.5919 19.1339 -0.710 0.477
## stand_age_rings -0.6923 0.5185 -1.335 0.182
## proportion_sol_burned:stand_age_rings 0.7773 0.6005 1.294 0.196
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21.9007 on 18 degrees of freedom
## Residual deviance: 7.5533 on 15 degrees of freedom
## AIC: 15.553
##
## Number of Fisher Scoring iterations: 10
reduced.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned + stand_age_rings, family = binomial)
summary(reduced.combustion)
##
## Call:
## glm(formula = combustion ~ proportion_sol_burned + stand_age_rings,
## family = binomial, data = combustion.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.39200 -0.48671 -0.03575 0.11308 1.79539
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.78147 2.99380 -1.263 0.2066
## proportion_sol_burned 12.52782 7.38223 1.697 0.0897 .
## stand_age_rings -0.05758 0.03494 -1.648 0.0993 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21.901 on 18 degrees of freedom
## Residual deviance: 10.708 on 16 degrees of freedom
## AIC: 16.708
##
## Number of Fisher Scoring iterations: 7
null.combustion <- glm(data = combustion.r.data, formula = combustion ~ 1, family = binomial)
summary(null.combustion)
##
## Call:
## glm(formula = combustion ~ 1, family = binomial, data = combustion.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7815 -0.7815 -0.7815 0.4263 1.6340
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.030 0.521 -1.976 0.0481 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21.901 on 18 degrees of freedom
## Residual deviance: 21.901 on 18 degrees of freedom
## AIC: 23.901
##
## Number of Fisher Scoring iterations: 4
And compare the models using a log-likelihood test.
anova(reduced.combustion, full.combustion, test = 'Chisq')
## Analysis of Deviance Table
##
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 16 10.7085
## 2 15 7.5533 1 3.1552 0.07569 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.
We can also look at the results of lrtest() from the {lmtest} package
lrtest(reduced.combustion, full.combustion)
## Likelihood ratio test
##
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 3 -5.3542
## 2 4 -3.7767 1 3.1552 0.07569 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here the higher log likelihood value indicates a greater model fit, but again the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.
Next, we must compare the accepted model(the reduced model) against the null model
anova(null.combustion, reduced.combustion, test = 'Chisq')
## Analysis of Deviance Table
##
## Model 1: combustion ~ 1
## Model 2: combustion ~ proportion_sol_burned + stand_age_rings
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 18 21.901
## 2 16 10.709 2 11.192 0.003712 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.
Our slope estimate (\(\beta1\)) again represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.
The coefficient/estimate for the proportion of the soil organic layer was positive. Therefore increasing proportions of the SOL burned results in an increased likelihood of legacy C being combusted. For every one unit of proportion of SOL burned, the probability of legacy C being combusted increases by 12.52782. Where the estimate for stand age is negative indicating every year the stand ages, the likelihood of legacy C being combusted decreases by 0.05758. Even though neither of these relationships showed a significant p-value.
In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function.
prop.sol.burned.change <- rev.logit(reduced.combustion$coefficients[2])
stand.age.change <- rev.logit(reduced.combustion$coefficients[3])
prop.sol.burned.change
## proportion_sol_burned
## 0.9999964
stand.age.change
## stand_age_rings
## 0.4856079
1 unit increase in proportion of SOL burned results in a 1% decrease in the likelihood of Legacy C combustion 1 unit increase in stand age results in a 51% decrease in the likelihood of Legacy C combustion
To test our null hypothesis that the response variable (Legacy C combustion) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.
#Calculating the Wald Statistic for proportion of SOL burned
modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald)) # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.08969244
#for stand age
modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald)) # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.900684
Again neither predictor variable seems to show a significant p-value.
Finally, we can calculate the confidence intervals around our estimates
CI <- confint.default(reduced.combustion, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood
CI
## 2.5 % 97.5 %
## (Intercept) -9.6492143 2.08628045
## proportion_sol_burned -1.9410968 26.99672747
## stand_age_rings -0.1260621 0.01089314
#Calculating the CIs for the actual odds ratios
prop.sol.burned.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age
prop.sol.burned.changeCI
## 2.5 % 97.5 %
## 0.1255274 1.0000000
stand.age.changeCI
## 2.5 % 97.5 %
## 0.4685261 0.5027233
Transform the factored combustion variable into a binary value for the ggplot object
combustion.r.data$combust.bin <- NULL
for (i in 1:nrow(combustion.r.data)) {
if (combustion.r.data$combustion[i] == 'Combusted') {
combustion.r.data$combust.bin[i] <- 1
}
else {
combustion.r.data$combust.bin[i] <- 0
}
}
To visualize the predicted likelihood of legacy C combustion by each separate predictor variable we must first generate a new model for each variable.
glm.sol.burned <- glm(data= combustion.r.data, formula = combustion ~ proportion_sol_burned, family = 'binomial')
summary(glm.sol.burned)
##
## Call:
## glm(formula = combustion ~ proportion_sol_burned, family = "binomial",
## data = combustion.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4942 -0.6229 -0.3927 0.3403 2.0983
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.327 2.059 -2.101 0.0356 *
## proportion_sol_burned 6.293 3.454 1.822 0.0685 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21.901 on 18 degrees of freedom
## Residual deviance: 17.498 on 17 degrees of freedom
## AIC: 21.498
##
## Number of Fisher Scoring iterations: 5
glm.stand.age <- glm(data = combustion.r.data, formula = combustion ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
##
## Call:
## glm(formula = combustion ~ stand_age_rings, family = "binomial",
## data = combustion.r.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1644 -0.8727 -0.3323 0.4042 2.2285
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.23090 1.30256 0.945 0.345
## stand_age_rings -0.03154 0.01979 -1.594 0.111
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 21.901 on 18 degrees of freedom
## Residual deviance: 17.318 on 17 degrees of freedom
## AIC: 21.318
##
## Number of Fisher Scoring iterations: 6
Then generate a new range of values of our predictor variables for which we will predict LC combustion.
new.sol.burned <- as.data.frame(seq(min(combustion.r.data$proportion_sol_burned),max(combustion.r.data$proportion_sol_burned), length.out = 32))
names(new.sol.burned) <- 'proportion_sol_burned'
new.stand.age <- as.data.frame(seq(min(combustion.r.data$stand_age_rings),max(combustion.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'
Finally we can predict the combustion status for the generated predictor variables.
prediction.sol.burned <- cbind(new.sol.burned,predict(glm.sol.burned, newdata = new.sol.burned, type = 'link', se = TRUE))
prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))
and generate the upper and lower confidence intervals for the fitted values.
prediction.sol.burned$LL <- rev.logit(prediction.sol.burned$fit - 1.96 * prediction.sol.burned$se.fit)
prediction.sol.burned$UL <- rev.logit(prediction.sol.burned$fit + 1.96 * prediction.sol.burned$se.fit)
head(prediction.sol.burned)
## proportion_sol_burned fit se.fit residual.scale LL UL
## 1 0.1799147 -3.194867 1.474844 1 0.002270320 0.4245377
## 2 0.1999821 -3.068592 1.411628 1 0.002913786 0.4251174
## 3 0.2200494 -2.942317 1.349013 1 0.003734564 0.4259849
## 4 0.2401168 -2.816042 1.287085 1 0.004779021 0.4271827
## 5 0.2601841 -2.689767 1.225949 1 0.006104368 0.4287614
## 6 0.2802515 -2.563492 1.165730 1 0.007780511 0.4307819
prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
## stand_age_rings fit se.fit residual.scale LL UL
## 1 19.00000 0.63168865 0.9816617 1 0.2154522 0.9279586
## 2 24.77419 0.44958614 0.8927017 1 0.2141467 0.9001850
## 3 30.54839 0.26748363 0.8101150 1 0.2107615 0.8647492
## 4 36.32258 0.08538112 0.7360500 1 0.2046832 0.8217192
## 5 42.09677 -0.09672139 0.6733248 1 0.1952206 0.7725902
## 6 47.87097 -0.27882389 0.6253610 1 0.1817506 0.7204880
Finally we can plot the predicted values and their 95% CI alongside the actual data
Figure 3.c and d from original paper
p <- ggplot(data = prediction.sol.burned, aes(x= proportion_sol_burned, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.burned, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Proportion SOL burned", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = proportion_sol_burned, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p
Caption: Predicted probability of Legacy C combustion for the proportion of SOL burned. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values.
p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = stand_age_rings, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p
Caption: Predicted probability of Legacy C combustion for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values. The dashed line represents the cutoff (60 years) for old vs young stand age classification.
So overall I was able to reproduce the exact figures and reproduce the trends similarly to the original data analysis. As stand age increased, the likelihood of legacy C presence and combustion both decreased. Legacy C combustion was more likely as the proportion of the SOL burned and the presence of legacy C became more likely as the prefire SOL depth increased. These trends indicate that more frequent fire intervals and increased fire severity under climate warming are threats to boreal forests ability to act as C sinks and counteract some of the C emissions contributing to greenhouse gas formation.
Where I strayed was in the statistical significance of my generalized linear model results. As stated in the introduction, the original authors were able to obtain significant results for all of their analyses. They did indicate that they used Firth’s logistic regression using the {logistf} package instead of the standard glm() function in the {lmtest} package to account for highly predictive covariates that are common in generalized linear models using smaller sample sizes. Yet when I tried running the analysis under that package the p-values still showed no significance.